Cross-Language Text Filtering Based on Text Concepts and kNN

Authors

  • Weifeng Su
  • Shaozi Li
  • Tanqiu Li
  • Wenjian You
Abstract

This paper presents a model for filtering the texts a user is interested in out of a large collection of source texts in Chinese or English. Each text the user is interested in is represented as a vector in the vector space of classifiable sememes, and the text to be filtered is represented as a vector in the same space. The relevance of a text to the user is measured by the cosine of the angle between that text and its k nearest neighbors in the vector space. Experiments were carried out, and their results show that this scheme yields good results.
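The scoring idea in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `SEMEME_LEXICON` mapping, the choice to average the cosine over the k nearest neighbors, and all function names are hypothetical, since the abstract does not specify them. The sketch shows how Chinese and English tokens could land on the same sememe dimensions, so that one profile can score texts in either language.

```python
import math
from collections import Counter

# Hypothetical word-to-sememe lexicon (an assumption for illustration);
# the concept dictionary actually used in the paper is not given in the abstract.
SEMEME_LEXICON = {
    "stock": "finance", "market": "finance", "bank": "finance",
    "股票": "finance", "银行": "finance",
    "football": "sport", "goal": "sport", "足球": "sport",
}

def to_sememe_vector(tokens):
    """Build a sparse sememe-count vector from tokens found in the lexicon."""
    return Counter(SEMEME_LEXICON[t] for t in tokens if t in SEMEME_LEXICON)

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_relevance(candidate, profile, k=2):
    """Mean cosine between the candidate and its k most similar profile vectors.

    Averaging over the k neighbors is an assumed aggregation rule.
    """
    sims = sorted((cosine(candidate, p) for p in profile), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0

# Profile: texts the user marked as interesting, in English and Chinese.
profile = [
    to_sememe_vector(["stock", "market"]),
    to_sememe_vector(["银行", "股票"]),
    to_sememe_vector(["bank", "market", "goal"]),
]

finance_score = knn_relevance(to_sememe_vector(["股票", "bank"]), profile)
sport_score = knn_relevance(to_sememe_vector(["football", "足球"]), profile)
```

In this toy setup the mixed-language finance text scores higher than the sport text, so a filter could keep texts whose score exceeds a chosen threshold. The threshold and the value of k would need to be tuned on real data.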


Similar resources

A new text-mining method for extracting user context information to improve the ranking of search engine results

Today, the importance of text processing and its uses is well known among researchers and students. The amount of textual material increases day by day, so we need useful ways to store it and to retrieve information from it. For example, search engines such as Google, Yahoo, and Bing need to read a great many web documents and retrieve the ones most similar to the user ...


Text Categorization for Authorship based on the Features of Lingual Conceptual Expression

Text categorization is an important field in automatic text information processing. Moreover, identifying the authorship of a text can be treated as a special case of text categorization. This paper adopts the conceptual primitives’ expression based on the Hierarchical Network of Concepts (HNC) theory, which can describe word meanings in hierarchical symbols, in order to avoid the spars...


Architecture Narration: A Comparative Study on Narration in Architecture and Story

The way architects think about different issues from developing plans, perspectives, and views to cross-sections and structure of a building is a common and general one. Regardless of its merits and efficiency, this way of thinking indicates a degradation in architectural thinking. Indeed, architectures today are caught in a specific architecture language where the boundaries of language create...


A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...


An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a variety of library resources. However, classifying documents within a large amount of data is still an issue, and finding particular documents demands time and energy. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly text documents. This is further facilitated by using Artificial...



Journal:
  • IJCLCLP

Volume 7

Publication date: 2002